Topic-based language models using EM
نویسندگان
چکیده
In this paper, we propose a novel statistical language model to capture topic-related long-range dependencies. Topics are modeled in a latent variable framework in which we also derive an EM algorithm to perform a topic factor decomposition based on a segmented training corpus. The topic model is combined with a standard language model to be used for on-line word prediction. Perplexity results indicate an improvement over previously proposed topic models, which unfortunately has not translated into lower word error.
منابع مشابه
A Comparison of Discriminative EM-Based Semi-Supervised Learning algorithms on Agreement/Disagreement classification
Recently, semi-supervised learning has been an active research topic in the natural language processing community, to save effort in hand-labeling for data-driven learning and to exploit a large amount of readily available unlabeled text. In this paper, we apply EM-based semi-supervised learning algorithms such as traditional EM, co-EM, and cross validation EM to the task of agreement/disagreem...
متن کاملSemantics-based Language Models for Information Retrieval and Text Mining
Semantics-based Language Models for Information Retrieval and Text Mining Xiaohua Zhou Xiaohua Hu The language modeling approach centers on the issue of estimating an accurate model by choosing appropriate language models as well as smoothing techniques. In the thesis, we propose a novel context-sensitive semantic smoothing method referred to as a topic signature language model. It extracts exp...
متن کاملA novel language model based on self-organized learning
Statistical language model is very important to speech recognition. To a system of special topic, domain dependent language model is much better than general model. There are two problems in traditional method to train topic dependent model: 1. The corpus of special topic is not as enough as general corpus. 2. An individual article always relates to more than one topics, traditional method has ...
متن کاملAdvertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles
When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...
متن کاملLatent Topic Networks: A Versatile Probabilistic Programming Framework for Topic Models
Topic models have become increasingly prominent text-analytic machine learning tools for research in the social sciences and the humanities. In particular, custom topic models can be developed to answer specific research questions. The design of these models requires a nontrivial amount of effort and expertise, motivating general-purpose topic modeling frameworks. In this paper we introduce lat...
متن کامل